SOME PROBLEMS IN ASYMPTOTIC CONVEX 
GEOMETRY AND RANDOM MATRICES MOTIVATED BY 
NUMERICAL ALGORITHMS 



ROMAN VERSHYNIN 

Abstract. The simplex method in Linear Programming motivates several
problems of asymptotic convex geometry. We discuss some conjectures and
known results in two related directions: computing the size of projections
of high dimensional polytopes, and estimating the norms of random matrices
and their inverses.



1. Asymptotic convex geometry and linear programming

Linear Programming studies the problem of maximizing a linear functional
subject to linear constraints. Given an objective vector $z \in \mathbb{R}^d$
and constraint vectors $a_1, \dots, a_n \in \mathbb{R}^d$, we consider the
linear program

$$\text{maximize } \langle z, x\rangle \quad \text{subject to } \langle a_i, x\rangle \le 1, \quad i = 1, \dots, n. \tag{LP}$$

This linear program has d unknowns, represented by x, and n constraints. 
Every linear program can be reduced to this form by a simple interpolation 
argument [36]. The feasible set of the linear program is the polytope 

$$P := \{x \in \mathbb{R}^d : \langle a_i, x\rangle \le 1, \ i = 1, \dots, n\}.$$

The solution of (LP) is then a vertex of P. We can thus look at (LP) from 
a geometric viewpoint: 

for a polytope P in $\mathbb{R}^d$ given by n faces, and for a vector z,
find the vertex that maximizes the linear functional $\langle z, x\rangle$.

The oldest and still the most popular method to solve this problem is 
the simplex method. It starts at some vertex of P and generates a walk 
on the edges of P toward the solution vertex. At each step, a pivot rule 
determines a choice of the next vertex; so there are many variants of the 
simplex method with different pivot rules. (We are not concerned here with
how to find the initial vertex, which is a nontrivial problem in itself.)
1.1. The shadow-vertex pivot rule and sections of polytopes. The
most widely known pivot rule maximizes the objective function $\langle z, x\rangle$
over the neighboring vertices. The resulting walk on the vertices is defined
iteratively, and is therefore usually hard to analyze. An alternative, the
shadow-vertex pivot rule [10], defines a walk on the polytope P as a
preimage of a projection of P. The resulting walk can be described in a
non-iterative way, so one hopes to analyze it with the methods of asymptotic
convex geometry.

Suppose we know a solution $x_0$ of (LP) for some other objective vector
$z_0$. The shadow-vertex simplex method interpolates between $z_0$ and z by
computing the solutions of (LP) for all $z'$ in the plane

$$E = \operatorname{span}(z_0, z).$$

From a geometric viewpoint, we consider the orthogonal projection Q(P)
of the feasible polytope P onto E. It is easily checked that the vertices $x_0$
and x of P are preserved by the projection: $Q(x_0)$ and $Q(x)$ are
vertices of the polygon Q(P).

The shadow-vertex simplex method thus computes the vertices of the
polygon Q(P) one by one, starting from $Q(x_0)$ and ending with $Q(x)$, so
at the end it outputs x, the solution of (LP). One can express this
computation as a pivot rule, and check that each next vertex can be
computed in polynomial time. The resulting walk on the polytope P is
therefore the preimage of the vertices of the polygon Q(P) under the
projection Q.
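To see the projection at work on an example that can be checked by hand, here is a minimal sketch in Python (assuming NumPy; the helper names `convex_hull_2d` and `cube_shadow_size` are ours, for illustration only). It takes the cube $P = [-1,1]^d$, i.e. (LP) with the $n = 2d$ constraint vectors $\pm e_i$, projects its $2^d$ vertices onto a random plane E, and counts the vertices of the polygon Q(P) with Andrew's monotone chain convex hull. For a generic plane the shadow is a polygon with exactly 2d vertices (a zonogon spanned by the projections of the d coordinate edges), so on this P the shadow-vertex walk passes through at most 2d of the $2^d$ vertices.

```python
import itertools
import numpy as np

def convex_hull_2d(points):
    """Andrew's monotone chain: hull vertices of 2D points in counterclockwise
    order; collinear boundary points are dropped."""
    pts = sorted(set(map(tuple, points)))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); positive for a left turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def cube_shadow_size(d, seed=0):
    """Number of vertices of Q(P) for the cube P = [-1, 1]^d projected onto
    a random 2-plane E (generic with probability 1)."""
    rng = np.random.default_rng(seed)
    # Orthonormal basis of E: reduced QR factorization of a random d x 2 Gaussian matrix.
    basis, _ = np.linalg.qr(rng.standard_normal((d, 2)))
    vertices = np.array(list(itertools.product([-1.0, 1.0], repeat=d)))
    shadow = vertices @ basis  # coordinates of Q(v) in the basis of E
    return len(convex_hull_2d(shadow))

for d in range(2, 8):
    print(d, 2 ** d, cube_shadow_size(d))  # dimension, vertices of P, vertices of Q(P)
```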

It will be convenient to work in the dual setting. The polar of P is

$$K := P^\circ = \{x \in \mathbb{R}^d : \langle x, y\rangle \le 1 \text{ for all } y \in P\} = \operatorname{conv}(0, a_1, \dots, a_n),$$

and the polar of the projection Q(P) is the section $K \cap E$. The length of
the walk in the shadow-vertex simplex method is thus bounded by the size
(the number of edges) of the polygon $K \cap E$.
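The identification of the polar of Q(P) with the section $K \cap E$ is a one-line computation. We sketch it assuming P is bounded, so that Q(P) is a polygon; the origin lies in its interior because $\langle a_i, 0\rangle = 0 < 1$ for all i. Polars of subsets of E are taken inside E.

```latex
For $u \in E$ and $x \in \mathbb{R}^d$ we have
$\langle u, Q(x)\rangle = \langle Q(u), x\rangle = \langle u, x\rangle$,
because $Q$ is self-adjoint and fixes $E$. Hence
\[
  Q(P)^\circ
  = \{u \in E : \langle u, Q(x)\rangle \le 1 \text{ for all } x \in P\}
  = \{u \in E : \langle u, x\rangle \le 1 \text{ for all } x \in P\}
  = P^\circ \cap E = K \cap E .
\]
Polarity in the plane $E$ takes the vertices of a polygon with the origin
in its interior to the edges of its polar, so
\[
  |\mathrm{vertices}(Q(P))| = |\mathrm{edges}(K \cap E)| .
\]
```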

1.2. Complexity of the simplex method and the size of sections. 

The running time of the simplex method is proportional to the length of the
walk on the edges of P that it generates. Hirsch's conjecture states that every
polytope P in $\mathbb{R}^d$ with n faces has diameter at most $n - d$. The diameter
is the maximum, over all pairs of vertices, of the length of the shortest walk
on the edges between them. The best known bound on the diameter is
$n^{\log_2 d + 2}$, due to Kalai and Kleitman [15].
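The cube shows that the bound $n - d$ in Hirsch's conjecture cannot be improved in general: the d-dimensional cube has $n = 2d$ faces and its edge graph has diameter $d = n - d$. A small breadth-first-search check in Python (the encoding of vertices as d-bit integers is our illustrative choice):

```python
from collections import deque

def cube_graph_diameter(d):
    """Diameter of the edge graph of the d-dimensional cube, by breadth-first search.
    Vertices are d-bit integers; two vertices are joined by an edge of the cube
    exactly when they differ in one bit (one coordinate)."""
    def eccentricity(source):
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for i in range(d):
                w = v ^ (1 << i)  # move along the edge parallel to the i-th axis
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        return max(dist.values())

    return max(eccentricity(s) for s in range(2 ** d))

for d in range(1, 7):
    n = 2 * d  # the d-cube has 2d faces
    print(d, cube_graph_diameter(d), n - d)
```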

For every known variant of the simplex method, an example of (LP) is
known for which the length of the walk on P is not polynomial in n and d.¹
For the classical (maximizing) pivot rule, such an example was first
constructed by Klee and Minty [18]: on a certain deformed cube, the walk
visits each of the $2^d$ vertices.

Similar pessimistic examples are known for the shadow-vertex simplex
method: the size of the planar section $K \cap E$ that bounds the length
of the walk is in general exponential in n, d. This follows, for example, from
the seminal construction in semidefinite programming by Ben-Tal and
Nemirovski [5], which yields a polytope K and a plane E such that the section
$K \cap E$ approximates the circle with an error exponentially small in n, d.
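Why an exponentially small error forces exponentially many edges: a short estimate, where we assume (for this sketch only) that the error ε of a polygon Π with N edges is measured by $B(0, 1-\varepsilon) \subseteq \Pi \subseteq B(0, 1)$, and that "exponentially small" means $\varepsilon = e^{-c(n+d)}$ for some $c > 0$.

```latex
The line through each edge of $\Pi$ is at distance at least $1-\varepsilon$
from the origin (the disk $B(0,1-\varepsilon)$ lies on one side of it), and the
edge lies in $B(0,1)$, so the edge is seen from the origin under an angle at
most $2\arccos(1-\varepsilon)$. These angles together cover the full angle
$2\pi$, hence
\[
  2N \arccos(1-\varepsilon) \ge 2\pi
  \quad\Longrightarrow\quad
  \varepsilon \ge 1 - \cos\frac{\pi}{N} = 2\sin^2\frac{\pi}{2N} \ge \frac{2}{N^2},
\]
using $\sin t \ge 2t/\pi$ for $0 \le t \le \pi/2$. Thus $N \ge \sqrt{2/\varepsilon}$,
and an error $\varepsilon = e^{-c(n+d)}$ forces $N \ge \sqrt{2}\, e^{c(n+d)/2}$ edges.
```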

Problem 1.1 (Sections of polytopes). Let K be a polytope in $\mathbb{R}^d$ with n
vertices, and let E be a two-dimensional subspace of $\mathbb{R}^d$. Estimate the
size (the number of edges) of the polygon $K \cap E$. Under what conditions on
K and E is this number polynomial in n, d?

This problem is somewhat opposite to the typical problems of asymptotic
convex geometry, whose ideal would be to produce the roundest section (a
fine approximation to a circle). In Problem 1.1 our ideal is a section with
the fewest edges, thus farthest from the circle. From the viewpoint of the
simplex method, "round" polytopes have high complexity, while polytopes
with few faces have low complexity.

1.3. Smoothed analysis and randomly perturbed polytopes. Despite
the known examples of exponentially long walks, on most problems that
occur in practice the simplex algorithm runs in polynomial and even linear
time. To explain this empirical evidence, the average analysis of the
simplex method was developed in the eighties: the (LP) was drawn at
random from some natural distribution, and the expected length of the walk
was shown to be polynomial in n, d (see e.g. [28], [12], [3], [2]).

In particular, Haimovich showed ([12]; see [26], Section 11.5) that if one
chooses the directions of the inequalities in (LP) uniformly at random as
$\le$ or $\ge$, then the expected length of the walk in the shadow-vertex
simplex method is at most d/2. Note that this bound does not depend on the
number of inequalities n.

However, reversing inequalities is hard to justify in practice. Spielman
and Teng [32] proposed to replace average analysis by a finer model, which
they called smoothed analysis, where the random inputs are replaced by
slight random perturbations of arbitrary inputs. Smoothed analysis thus
interpolates between the worst case analysis (arbitrary inputs) and the
average analysis of Smale (random inputs).

¹ Recently, a randomized polynomial time pivoting algorithm for (LP) was
found by Kelner and Spielman [16]. However, their algorithm generates a walk
on some other polytope related to (LP), and not on P.

Spielman and Teng [31] first showed that the shadow-vertex simplex
method has polynomial smoothed complexity: if the polytope K is randomly
perturbed, then its section $K \cap E$ has expected polynomial size (which
in turn bounds the length of the walk in the simplex method). Their result
was improved in [7] and in [36], and the current best bound is as follows:

Theorem 1.2 ([36]). Let $a_1, \dots, a_n$ be independent Gaussian vectors in
$\mathbb{R}^d$ with centers of norm at most 1, and whose components have standard
deviation $\sigma \le 1/(6\sqrt{d \log n})$. Let E be a plane in $\mathbb{R}^d$. Then the
random polytope $K = \operatorname{conv}(a_1, \dots, a_n)$ satisfies

$$\mathbb{E}\,|\mathrm{edges}(K \cap E)| \le C d^3 \sigma^{-4}, \tag{1.1}$$

where C is an absolute constant.

The prior weaker bound of Spielman and Teng [31] was $C n d^3 \sigma^{-6}$; the
subsequent work of Deshpande and Spielman [7] improved upon the exponent
of d but doubled the exponent of n.

Theorem 1.2 shows that the expected size of the section $K \cap E$ is
polylogarithmic in n, while the previous bounds were polynomial in n. Going
back to the pre-dual polytope P, this indicates that random perturbations of
a polytope create short walks between any two given vertices. Note that for
large n, this polylogarithmic bound becomes better than the bound $n - d$
in Hirsch's conjecture.
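A toy experiment in the spirit of Theorem 1.2, as a hedged Python sketch assuming NumPy. We take d = 2, so that E is the whole plane and the section $K \cap E$ is K itself, and place the n centers on the unit circle; this "round" choice is ours, made because the unperturbed polygon then has all n vertices. The sketch estimates the mean number of edges after a small Gaussian perturbation. It only illustrates the smoothing effect and checks neither the constant nor the exponents in (1.1).

```python
import numpy as np

def hull_vertex_count(points):
    """Number of vertices (= number of edges) of the convex hull of 2D points,
    computed with Andrew's monotone chain."""
    pts = sorted(set(map(tuple, points)))
    if len(pts) <= 2:
        return len(pts)

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def half(seq):
        # one monotone half of the hull, keeping only strict left turns
        chain = []
        for p in seq:
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain

    # the two halves share their two endpoints
    return len(half(pts)) + len(half(reversed(pts))) - 2

def mean_edges_2d(n, sigma, trials=20, seed=0):
    """Monte Carlo estimate of E|edges(K)| for K = conv(a_1, ..., a_n) in R^2,
    a_i = c_i + sigma * g_i, with c_i equally spaced on the unit circle
    (a hypothetical choice of centers) and g_i standard Gaussian in R^2."""
    rng = np.random.default_rng(seed)
    angles = 2 * np.pi * np.arange(n) / n
    centers = np.column_stack([np.cos(angles), np.sin(angles)])
    counts = [hull_vertex_count(centers + sigma * rng.standard_normal((n, 2)))
              for _ in range(trials)]
    return float(np.mean(counts))

print(mean_edges_2d(2000, 0.0, trials=1))  # unperturbed: every center is a vertex
print(mean_edges_2d(2000, 0.03))           # perturbed: far fewer edges
```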

Theorem 1.2 provides a solution to Problem 1.1 for a randomly perturbed
polytope and a fixed subspace. A seemingly harder problem, which is still
open, concerns an arbitrary polytope and a randomly perturbed subspace. This
version would be significant for the analysis of the simplex method, because
it would allow one to leave the constraints intact and perturb only the
objective function.

Another open problem is to estimate the diameter of randomly perturbed
polytopes, rather than of all polytopes as in Hirsch's conjecture.

Problem 1.3 (Spielman-Teng [31]). Let K be a perturbed polytope as in
Theorem 1.2. Estimate the expected diameter of $P = K^\circ$. Is it always
polynomial in n, d and $1/\sigma$? Perhaps even polylogarithmic in n?

Finally, no analog of Theorem 1.2 is known for bounded perturbations,
i.e. for $a_i = \bar a_i + \sigma \theta_i$, where $\bar a_i$ are arbitrary fixed vectors of
norm at most 1 and $\theta_i$ are independent vectors chosen uniformly at random
from $\{-1, 1\}^d$ or from $[-1, 1]^d$. Such bounded smoothed analysis is a common
model for roundoff errors when real numbers are represented as binary
numbers in computers [TT].

1.4. Nondegeneracy of faces and invertibility of random matrices. 

The approach to Theorem 1.2 developed by Spielman and Teng [31] is based
on the intuition that most faces of K should be nondegenerate simplices
(e.g. they have inscribed balls of polynomial radii, i.e. radii bounded below
by an inverse polynomial). If the plane E intersects such a nondegenerate
simplex F, the length of the interval $E \cap F$ is likely to be polynomially
big in the same sense (if the plane intersects a simplex, it is likely to pass
through its "bulk" rather than touch only the boundary).

On the other hand, with high probability all vectors in the perturbed
polytope K have norms O(log n), since its vertices are Gaussian perturbations
of n vectors of norm at most 1. Therefore, the perimeter of the polygon
$K \cap E$ is at most O(log n). Since all edges $E \cap F$ of this polygon are
polynomially big, we conclude that there are at most polynomially many
edges, as desired.
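The counting step of this argument, written out as an inequality. The letters R (a bound on the norms of all vectors of K) and ℓ (a lower bound on the lengths of all edges $E \cap F$) are introduced only for this sketch:

```latex
\[
  |\mathrm{edges}(K \cap E)| \cdot \ell
  \;\le\; \sum_{F} \operatorname{length}(E \cap F)
  \;=\; \operatorname{perimeter}(K \cap E)
  \;\le\; 2\pi R ,
\]
where the sum runs over the faces $F$ of $K$ whose intersection with $E$ is an
edge of $K \cap E$, and the last step uses that a convex set contained in a
disk of radius $R$ has perimeter at most $2\pi R$. Hence
\[
  |\mathrm{edges}(K \cap E)| \le \frac{2\pi R}{\ell},
\]
which is polynomial when $R = O(\log n)$ and $1/\ell$ is bounded by a polynomial.
```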

There are several places where this approach breaks down or is not known
to succeed. One such problem is the nondegeneracy of the faces. The
nondegeneracy of a simplex S is usually quantified by the smallest singular
value of the matrix A that realizes the change of basis from the standard
simplex to S. For the polytope $K = \operatorname{conv}(a_1, \dots, a_n)$, each face is a
simplex with vertices $(a_i)_{i \in I}$ for some d-element subset $I \subset \{1, \dots, n\}$.
If the vertices $a_i$ are Gaussian as in Theorem 1.2, the change of basis is a
$d \times d$ matrix with independent Gaussian entries. Thus we need random
Gaussian matrices to be far from singular. The quantitative theory of
invertibility of random matrices is the subject of the next section.

2. Invertibility of random matrices 

For a one-to-one linear operator $A : X \to Y$ between two normed spaces
X and Y, two quantities are central in functional analysis: the norm $\|A\|$
and the norm of the inverse $\|A^{-1}\|$. If the operator is not onto, the
inverse norm is computed for A regarded as an operator onto its image $A(X)$.
Thus

$$\|A\| = \sup_{x \in X:\ \|x\| = 1} \|Ax\|, \qquad \frac{1}{\|A^{-1}\|} = \inf_{x \in X:\ \|x\| = 1} \|Ax\|.$$

The operator A can be viewed as realizing an embedding of the space X into
the space Y, and the product $\|A\|\,\|A^{-1}\|$ is the distortion of the embedding
(see [13]).

The canonical example is when both X and Y are finite dimensional Euclidean
spaces, say $X = \mathbb{R}^k$, $Y = \mathbb{R}^n$, where we identify the linear operator
A with its $n \times k$ matrix. The singular values of A are the eigenvalues of
$|A| = \sqrt{A^* A}$, the largest and the smallest singular values being

$$\lambda_{\max}(A) = \|A\|, \qquad \lambda_{\min}(A) = \frac{1}{\|A^{-1}\|}.$$

In the numerical linear algebra and scientific computing literature, the
distortion

$$\kappa(A) = \lambda_{\max}(A)/\lambda_{\min}(A) = \|A\|\,\|A^{-1}\|$$

is commonly called the condition number of A.
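A minimal numerical illustration of these definitions, in Python with NumPy (the function name `operator_norms` is ours): the norm, the inverse norm and the condition number are read off from the singular values, and random unit vectors confirm that $\|Ax\|$ stays between $\lambda_{\min}(A)$ and $\lambda_{\max}(A)$.

```python
import numpy as np

def operator_norms(A):
    """Return (||A||, ||A^{-1}||, kappa(A)) in the Euclidean norms, for a matrix A
    of full column rank; A^{-1} is the inverse of A onto its image."""
    s = np.linalg.svd(A, compute_uv=False)  # singular values, in descending order
    lam_max, lam_min = s[0], s[-1]
    return lam_max, 1.0 / lam_min, lam_max / lam_min

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))  # an operator from R^3 to R^5, stored as a 5 x 3 matrix
norm, inv_norm, kappa = operator_norms(A)

# ||Ax|| over unit vectors x ranges between lambda_min(A) and lambda_max(A).
x = rng.standard_normal((3, 100_000))
x /= np.linalg.norm(x, axis=0)
lengths = np.linalg.norm(A @ x, axis=0)
print(lengths.min(), 1.0 / inv_norm, lengths.max(), norm, kappa)
```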

We are interested in estimating these quantities for random matrices A.
For one reason, random matrices sometimes provide an intuition for what
to expect in practice; we saw such reasoning about average analysis and
smoothed analysis of the simplex method in the previous section. Random
linear operators with controllable distortion (or their adjoints) also serve
as handy tools in most randomized constructions in geometric functional
analysis [7], geometric algorithms in theoretical computer science [34], [35],
compressed sensing in information processing (jS], [5]), vector quantization
[T5] and some other fields.

2.1. Gaussian matrices. We start with the simplest case of a Gaussian
matrix, one whose entries are i.i.d. standard normal random variables.
The asymptotics of the largest and the smallest singular values are well
understood in this case: for an $n \times d$ Gaussian matrix A with $n \ge d$, one
has

$$\lambda_{\max}(A) \approx \sqrt{n} + \sqrt{d}, \qquad \lambda_{\min}(A) \approx \sqrt{n} - \sqrt{d} \quad \text{with high probability.}$$

There is a long history of such asymptotic results. In particular, the largest
and the smallest singular values converge almost surely to their corresponding
values above (in the sense that $\lambda_{\max}(A)/\sqrt{n} \to 1 + \sqrt{y}$ and
$\lambda_{\min}(A)/\sqrt{n} \to 1 - \sqrt{y}$) as the dimension n grows to infinity and
the aspect ratio d/n converges to a constant $y \in (0, 1]$, see [7]. A sharp
nonasymptotic result, valid for every fixed n and d, follows from Gordon's
inequality (see [7]):

$$\sqrt{n} - \sqrt{d} \le \mathbb{E}\,\lambda_{\min}(A) \le \mathbb{E}\,\lambda_{\max}(A) \le \sqrt{n} + \sqrt{d}.$$

Combining this with the concentration of measure inequality, one deduces a
deviation bound [7]: for every t > 0, with probability at least $1 - 2e^{-t^2/2}$
one has

$$\sqrt{n} - \sqrt{d} - t \le \lambda_{\min}(A) \le \lambda_{\max}(A) \le \sqrt{n} + \sqrt{d} + t. \tag{2.1}$$
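A quick numerical look at (2.1), assuming NumPy; the sizes n = 2000, d = 200 and the deviation t = 4 are arbitrary choices, for which (2.1) holds with probability at least $1 - 2e^{-8}$.

```python
import numpy as np

def extreme_singular_values(n, d, seed=0):
    """(lambda_min, lambda_max) of an n x d matrix with i.i.d. N(0, 1) entries."""
    rng = np.random.default_rng(seed)
    s = np.linalg.svd(rng.standard_normal((n, d)), compute_uv=False)
    return s[-1], s[0]

n, d, t = 2000, 200, 4.0
lam_min, lam_max = extreme_singular_values(n, d)
print(np.sqrt(n) - np.sqrt(d) - t, lam_min, lam_max, np.sqrt(n) + np.sqrt(d) + t)
```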

Note that the lower bound becomes meaningless for square Gaussian matrices,
those with n = d. Yet this case is central in some applications: as we
saw in Section 1.4, square matrices determine the faces of the polytope
K in linear programming, and nondegeneracy of such a face translates into
a lower bound for $\lambda_{\min}(A)$. To guess the order of the smallest singular
value, note that for an $n \times (n-1)$ matrix the lower bound is
$\sqrt{n} - \sqrt{n-1} \sim n^{-1/2}$.

Von Neumann and his associates, who used random matrices as test matrices
for their algorithms, indeed speculated that for random square matrices
one should have

$$\lambda_{\min}(A) \sim n^{-1/2} \quad \text{with high probability} \tag{2.2}$$

(see [2Z], pp. 14, 477, 555). In a more precise form, this estimate was
conjectured by Smale [29] and proved by Edelman [9] for Gaussian matrices:
for every $\varepsilon > 0$, one has

$$\mathbb{P}\big(\lambda_{\min}(A) \le \varepsilon n^{-1/2}\big) \sim \varepsilon. \tag{2.3}$$
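A Monte Carlo sketch of (2.3) in Python with NumPy. For comparison it prints the limiting value $1 - e^{-\varepsilon - \varepsilon^2/2}$ of the probability as $n \to \infty$; this closed form is quoted from memory of Edelman's result in [9] and should be checked against the source. Both are of order ε for small ε.

```python
import numpy as np

def small_sv_frequency(n, eps, trials=1000, seed=0):
    """Monte Carlo estimate of P(lambda_min(A) <= eps * n^(-1/2)) for an n x n
    matrix A with i.i.d. N(0, 1) entries."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((trials, n, n))
    # np.linalg.svd acts on each matrix of the stack; the last column holds lambda_min.
    lam_min = np.linalg.svd(A, compute_uv=False)[:, -1]
    return float(np.mean(np.sqrt(n) * lam_min <= eps))

def edelman_limit(eps):
    """Limit of the same probability as n -> infinity: 1 - exp(-eps - eps^2 / 2)."""
    return 1.0 - np.exp(-eps - eps ** 2 / 2)

for eps in (0.05, 0.1, 0.2):
    print(eps, small_sv_frequency(50, eps), edelman_limit(eps))
```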

In particular, the smallest singular value is not concentrated: the mean and
the standard deviation of $n^{1/2} \lambda_{\min}(A)$ are both of the order of a
constant. This is very different from the behavior of the largest singular
value, which by (2.1) is tightly concentrated around its mean.

An elegant argument by Sankar, Spielman and Teng [23] generalizes (2.3)
to random Gaussian perturbations of an arbitrary matrix, i.e. to the
smoothed analysis setting of Theorem 1.2:

Theorem 2.1 (Sankar, Spielman and Teng [25]). Let A be an $n \times n$ matrix
with independent Gaussian random entries (not necessarily centered), each
of variance $\sigma^2$. Then, for every $\varepsilon > 0$, one has

$$\mathbb{P}\big(\lambda_{\min}(A) \le \varepsilon n^{-1/2}\big) \le C\varepsilon/\sigma,$$

where C = 1.823.
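The same experiment in the smoothed setting of Theorem 2.1, again a hedged sketch assuming NumPy. The center is the all-ones matrix, a hypothetical choice that is singular before the perturbation; after adding $\sigma G$, the empirical frequency of $\lambda_{\min} \le \varepsilon n^{-1/2}$ can be compared with the bound $1.823\,\varepsilon/\sigma$.

```python
import numpy as np

def smoothed_small_sv_frequency(center, sigma, eps, trials=1000, seed=0):
    """Monte Carlo estimate of P(lambda_min(center + sigma*G) <= eps * n^(-1/2)),
    where `center` is a fixed n x n matrix and G has i.i.d. N(0, 1) entries."""
    rng = np.random.default_rng(seed)
    n = center.shape[0]
    A = center + sigma * rng.standard_normal((trials, n, n))  # broadcast over the stack
    lam_min = np.linalg.svd(A, compute_uv=False)[:, -1]
    return float(np.mean(lam_min <= eps / np.sqrt(n)))

n, sigma, eps = 40, 0.5, 0.05
center = np.ones((n, n))  # hypothetical rank-one center: singular before perturbation
print(np.linalg.svd(center, compute_uv=False)[-1])  # zero up to rounding
print(smoothed_small_sv_frequency(center, sigma, eps), 1.823 * eps / sigma)
```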

In applications to random polytopes such as in Section 1.4, we need all
faces to be nondegenerate, so all $d \times d$ submatrices of a random $n \times d$
Gaussian matrix (whose rows are the constraint vectors from (LP)) must be
nicely invertible. This motivates the following problem:

Problem 2.2. Let A be an $n \times d$ Gaussian matrix (with i.i.d. standard normal
entries or, more generally, as in Theorem 2.1). Estimate the expected
minimum of the smallest singular values of all $d \times d$ submatrices of A.

In particular, if $n = O(d)$, we want this minimum to be polynomially
small rather than exponentially small in d.
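For small sizes the quantity in Problem 2.2 can be computed by brute force, as in the Python sketch below (assuming NumPy; the function name is ours). The number of submatrices is $\binom{n}{d}$, which is exponential in d when n is a constant multiple of d; this is why a union bound over all faces needs each failure probability to be exponentially small.

```python
import itertools
import numpy as np

def min_square_submatrix_sv(A):
    """Minimum of lambda_min(A_I) over all d-element row subsets I of an n x d
    matrix A, by brute force over all C(n, d) square submatrices."""
    n, d = A.shape
    return min(np.linalg.svd(A[list(rows)], compute_uv=False)[-1]
               for rows in itertools.combinations(range(n), d))

rng = np.random.default_rng(0)
for n in (6, 8, 10, 12):
    A = rng.standard_normal((n, 4))
    print(n, min_square_submatrix_sv(A))
```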

2.2. General matrices with i.i.d. entries. Most problems we discussed
become much harder once Gaussian matrices are replaced with other natural
matrices with i.i.d. entries. Nevertheless, understanding discrete
matrices, whose entries take values in a finite set, is important in
applications such as numerical algorithms, which can only deal with discrete
values. A survey on random discrete matrices was recently written by Vu
[38].

Asymptotic theory of random matrices has developed to a point where
the behavior of the largest singular value is well understood. Suppose A is
an $n \times d$ matrix with i.i.d. centered entries of variance 1. Then
finiteness of the fourth moment of the entries is necessary and sufficient
for

$$\lambda_{\max}(A) = (1 + o(1))(\sqrt{n} + \sqrt{d}) \quad \text{almost surely} \tag{2.4}$$

as the dimension n grows to infinity and the aspect ratio d/n converges to a
constant [39]. A similar statement holds for convergence in probability,
under a slightly weaker condition than the fourth moment [27].

Under a much stronger subgaussian moment assumption, which still holds for
Gaussian random variables and for discrete random variables taking finitely
many values, a parallel non-asymptotic result is known (for all finite n
and d). A random variable ξ is called subgaussian if its tail is dominated
by that of the standard normal random variable: there exists a constant
B > 0 such that

$$\mathbb{P}(|\xi| > t) \le 2\exp(-t^2/B^2) \quad \text{for all } t > 0. \tag{2.5}$$

The minimal B here is called the subgaussian moment.² Gaussian random
variables and all bounded random variables, in particular the symmetric ±1
random variable, are examples of subgaussian random variables. Inequality
(2.5) is often equivalently stated (up to an absolute constant factor in B)
as the moment condition

$$(\mathbb{E}|\xi|^p)^{1/p} \le C B \sqrt{p} \quad \text{for all } p \ge 1, \tag{2.6}$$

where C is an absolute constant.
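A small Python check of these definitions (standard library only; the function names are ours). For the symmetric ±1 variable the minimal B in (2.5) can be computed exactly; for a standard Gaussian g we check (2.5) with $B = \sqrt{2}$ on a grid, using $\mathbb{P}(|g| > t) = \operatorname{erfc}(t/\sqrt{2})$, and the moment form (2.6) using the exact formula $\mathbb{E}|g|^p = 2^{p/2}\Gamma((p+1)/2)/\sqrt{\pi}$.

```python
import math

def gaussian_moment(p):
    """(E|g|^p)^(1/p) for g ~ N(0, 1), from E|g|^p = 2^(p/2) Gamma((p+1)/2) / sqrt(pi)."""
    return (2 ** (p / 2) * math.gamma((p + 1) / 2) / math.sqrt(math.pi)) ** (1 / p)

def gaussian_tail_ok(B, ts):
    """Check the tail condition (2.5) for g ~ N(0, 1) at every t in ts,
    using P(|g| > t) = erfc(t / sqrt(2))."""
    return all(math.erfc(t / math.sqrt(2)) <= 2 * math.exp(-t * t / B ** 2) for t in ts)

# Symmetric +-1 variable: P(|xi| > t) = 1 for t < 1 and 0 for t >= 1, so (2.5)
# requires 1 <= 2 exp(-t^2 / B^2) for all t < 1, i.e. B >= 1 / sqrt(ln 2).
rademacher_B = 1 / math.sqrt(math.log(2))

# Moment form (2.6) for the Gaussian: the ratio below stays bounded (it tends to e^(-1/2)).
ratios = [gaussian_moment(p) / math.sqrt(p) for p in range(1, 51)]
print(rademacher_B, gaussian_tail_ok(math.sqrt(2), [0.01 * k for k in range(1, 1001)]), max(ratios))
```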

The following non-asymptotic result follows from a more general result
of Klartag and Mendelson ([17], Theorem 1.4), who proved it with constant
probability; it was later improved by Mendelson, Pajor and
Tomczak-Jaegermann ([22], Theorem D) to an exponential probability:

Theorem 2.3 ([17], [22]). Let A be an $n \times d$ matrix ($n \ge d$) with i.i.d.
centered subgaussian entries with variance 1. Then, with probability at least
$1 - e^{-cn}$, one has

$$\sqrt{n} - C\sqrt{d} \le \lambda_{\min}(A) \le \lambda_{\max}(A) \le \sqrt{n} + C\sqrt{d},$$

where $C, c > 0$ depend only on the subgaussian moment of the entries.

This estimate approaches the sharp asymptotic bound (2.4) for very tall
matrices (for small aspect ratios d/n). However, the lower bound becomes
useless for aspect ratios above some constant, and in particular says
nothing about square matrices.
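An illustration of Theorem 2.3 for symmetric ±1 entries, in Python with NumPy (the sizes are arbitrary choices): for tall matrices the extreme singular values sit near $\sqrt{n} \pm \sqrt{d}$, while in the square case the smallest singular value collapses to order $n^{-1/2}$, where the lower bound of the theorem is vacuous.

```python
import numpy as np

def rademacher_extremes(n, d, seed=0):
    """(lambda_min, lambda_max) of an n x d matrix with i.i.d. symmetric +-1 entries."""
    rng = np.random.default_rng(seed)
    A = rng.choice([-1.0, 1.0], size=(n, d))
    s = np.linalg.svd(A, compute_uv=False)
    return s[-1], s[0]

for n, d in [(1000, 10), (1000, 100), (1000, 500), (400, 400)]:
    lam_min, lam_max = rademacher_extremes(n, d)
    print(n, d, round(lam_min, 2), round(np.sqrt(n) - np.sqrt(d), 2),
          round(lam_max, 2), round(np.sqrt(n) + np.sqrt(d), 2))
```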



² In the literature in geometric functional analysis, the subgaussian moment
is often called the $\psi_2$-norm.
Problem 2.4. Let A be an $n \times d$ matrix ($n \ge d$) with i.i.d. centered
subgaussian entries with variance 1. Is it true that with high probability one
has

$$\lambda_{\min}(A) \ge c(\sqrt{n} - \sqrt{d}),$$

where $c > 0$ depends only on the subgaussian moment of the entries?

In a positive direction, lower bounds valid for all aspect ratios
$y := d/n < 1$ were proved by Litvak, Rudelson, Pajor and Tomczak-Jaegermann
[20] with an exponential dependence on $1 - y$, and improved to a linear
dependence by Rudelson [25].

Nevertheless, even a positive solution of Problem 2.4 would not say
anything about square matrices, those with n = d. This problem was recently
solved in the work [23], which confirmed prediction (2.2) for general matrices
with independent entries. Recall that the bounded fourth moment of the
entries is necessary and sufficient to control the largest singular value as in
(2.4). In fact, [24] proves that the fourth moment assumption (i.e. the fourth
moments of the entries are uniformly bounded) is also sufficient to control
the smallest singular value: for an $n \times n$ matrix A with independent random
centered entries of variances at least 1, prediction (2.2) holds under the
fourth moment assumption. The identical distribution of the entries is not
needed in this result.

For a stronger subgaussian assumption on the entries, prediction (2.2)
holds with exponentially high probability. This was conjectured by Spielman
and Teng [31] for random ±1 matrices:

$$\mathbb{P}\big(\lambda_{\min}(A) \le \varepsilon n^{-1/2}\big) \le \varepsilon + c^n \quad \text{for some } c < 1,$$

and proved in [24] in more generality, for all matrices with subgaussian i.i.d.
entries, and up to a constant factor which depends only on the subgaussian
moment.

Theorem 2.5 ([24]). Let A be an $n \times n$ matrix with i.i.d. centered
subgaussian entries with variance 1. Then for every $\varepsilon \ge 0$ one has

$$\mathbb{P}\big(\lambda_{\min}(A) \le \varepsilon n^{-1/2}\big) \le C\varepsilon + c^n, \tag{2.7}$$

where $C > 0$ and $c < 1$ are constants that depend (polynomially) only on
the subgaussian moment of the entries.

In particular, for $\varepsilon = 0$ we deduce that a random square matrix with
i.i.d. centered subgaussian entries with variance 1 is singular with
exponentially small probability. For random matrices with ±1 entries, this
was proved by Kahn, Komlos and Szemeredi [14]. For more on prior work and
related conjectures on the singularity probability, see [38, 24].
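The exponential decay in (2.7) is a statement about large n. A brute-force Python computation (assuming NumPy; it enumerates all $2^{n^2}$ sign patterns, so only n ≤ 4 is feasible) shows that for tiny n the singularity probability of a ±1 matrix is far from small: 1/2 for n = 2 and 5/8 for n = 3. This does not contradict (2.7), whose additive term $c^n$ has an unspecified constant c < 1.

```python
import itertools
import numpy as np

def singular_probability_exact(n):
    """Exact P(A is singular) for an n x n matrix with i.i.d. symmetric +-1
    entries, by enumerating all 2**(n*n) sign patterns (use only n <= 4)."""
    singular = 0
    for signs in itertools.product((-1.0, 1.0), repeat=n * n):
        det = np.linalg.det(np.array(signs).reshape(n, n))
        # The determinant of an integer matrix is an integer, so rounding is exact here.
        singular += round(det) == 0
    return singular / 2 ** (n * n)

for n in range(1, 4):
    print(n, singular_probability_exact(n))
```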